Text File | 1995-06-28 | 10KB | 392 lines
Start conditions
Previous: <Generated scanner=>Generateds> * Next: <Multiple buffers=>Multiplebu> * Up: <Top=>!Root>
#Wrap on
{fH3}Start conditions{f}
{fCode}flex{f} provides a mechanism for conditionally activating
rules. Any rule whose pattern is prefixed with "<sc>"
will only be active when the scanner is in the start
condition named "sc". For example,
#Wrap off
#fCode
<STRING>[^"]\* \{ \/\* eat up the string body ... \*\/
…
\}
#f
#Wrap on
will be active only when the scanner is in the "STRING"
start condition, and
#Wrap off
#fCode
<INITIAL,STRING,QUOTE>\\. \{ \/\* handle an escape ... \*\/
…
\}
#f
#Wrap on
will be active only when the current start condition is
either "INITIAL", "STRING", or "QUOTE".
Start conditions are declared in the definitions (first)
section of the input using unindented lines beginning with
either {fEmphasis}%s{f} or {fEmphasis}%x{f} followed by a list of names. The former
declares {fEmphasis}inclusive{f} start conditions, the latter {fEmphasis}exclusive{f}
start conditions. A start condition is activated using
the {fCode}BEGIN{f} action. Until the next {fCode}BEGIN{f} action is
executed, rules with the given start condition will be active
and rules with other start conditions will be inactive.
If the start condition is {fEmphasis}inclusive{f}, then rules with no
start conditions at all will also be active. If it is
{fEmphasis}exclusive{f}, then {fEmphasis}only{f} rules qualified with the start
condition will be active. A set of rules contingent on the
same exclusive start condition describes a scanner which is
independent of any of the other rules in the {fCode}flex{f} input.
Because of this, exclusive start conditions make it easy
to specify "mini-scanners" which scan portions of the
input that are syntactically different from the rest
(e.g., comments).
If the distinction between inclusive and exclusive start
conditions is still a little vague, here's a simple
example illustrating the connection between the two. The set
of rules:
#Wrap off
#fCode
%s example
%%
<example>foo do\_something();
bar something\_else();
#f
#Wrap on
is equivalent to
#Wrap off
#fCode
%x example
%%
<example>foo do\_something();
<INITIAL,example>bar something\_else();
#f
#Wrap on
Without the {fEmphasis}<INITIAL,example>{f} qualifier, the {fEmphasis}bar{f} pattern
in the second example wouldn't be active (i.e., couldn't match) when
in start condition {fEmphasis}example{f}. If we just used {fEmphasis}<example>{f}
to qualify {fEmphasis}bar{f}, though, then it would only be active in
{fEmphasis}example{f} and not in {fCode}INITIAL{f}, while in the first example
it's active in both, because in the first example the {fEmphasis}example{f}
start condition is an {fEmphasis}inclusive{f} ({fEmphasis}%s{f}) start condition.
Also note that the special start-condition specifier {fEmphasis}<\*>{f}
matches every start condition. Thus, the above example
could also have been written:
#Wrap off
#fCode
%x example
%%
<example>foo do\_something();
<\*>bar something\_else();
#f
#Wrap on
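The activation rules described so far can be condensed into a single predicate. The following C sketch is only a model of that behaviour, not code generated by {fCode}flex{f}; the names {fCode}struct rule{f}, {fCode}rule\_is\_active{f}, and the {fCode}SC\_{f} constants are hypothetical, and {fEmphasis}example{f} is assumed to be declared with {fEmphasis}%x{f} as in the listing above.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: condition 0 plays the role of INITIAL. */
enum { SC_INITIAL = 0, SC_EXAMPLE = 1, SC_COUNT = 2 };

/* One entry per start condition: true for %x, false for %s.
 * INITIAL behaves like an inclusive condition. */
static const bool sc_is_exclusive[SC_COUNT] = { false, true };

struct rule {
    bool   matches_all;      /* rule was written with the <*> prefix     */
    size_t n_conds;          /* number of conditions in the <...> prefix */
    int    conds[SC_COUNT];  /* the listed conditions                    */
};

/* Return true if rule r may match while the scanner is in `current`. */
bool rule_is_active(const struct rule *r, int current)
{
    if (r->matches_all)
        return true;

    /* A qualified rule is active only in the conditions it lists. */
    for (size_t i = 0; i < r->n_conds; i++)
        if (r->conds[i] == current)
            return true;

    /* An unqualified rule is active unless the condition is exclusive. */
    if (r->n_conds == 0)
        return !sc_is_exclusive[current];

    return false;
}
```

In this model a rule with an empty condition list is reported as inactive in {fCode}SC\_EXAMPLE{f}, which is why the exclusive version of the example needs an explicit {fEmphasis}<INITIAL,example>{f} or {fEmphasis}<\*>{f} prefix on {fEmphasis}bar{f}.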
The default rule (to {fEmphasis}ECHO{f} any unmatched character) remains
active in start conditions. It is equivalent to:
#Wrap off
#fCode
<\*>.|\\n ECHO;
#f
#Wrap on
{fEmphasis}BEGIN(0){f} returns to the original state where only the
rules with no start conditions are active. This state can
also be referred to as the start-condition "INITIAL", so
{fEmphasis}BEGIN(INITIAL){f} is equivalent to {fEmphasis}BEGIN(0){f}. (The
parentheses around the start condition name are not required but
are considered good style.)
{fCode}BEGIN{f} actions can also be given as indented code at the
beginning of the rules section. For example, the
following will cause the scanner to enter the "SPECIAL" start
condition whenever {fEmphasis}yylex(){f} is called and the global
variable {fCode}enter\_special{f} is true:
#Wrap off
#fCode
        int enter\_special;
%x SPECIAL
%%
        if ( enter\_special )
            BEGIN(SPECIAL);
<SPECIAL>blahblahblah
…more rules follow…
#f
#Wrap on
To illustrate the uses of start conditions, here is a
scanner which provides two different interpretations of a
string like "123.456". By default it will treat it as
three tokens, the integer "123", a dot ('.'), and the
integer "456". But if the string is preceded earlier in
the line by the string "expect-floats" it will treat it as
a single token, the floating-point number 123.456:
#Wrap off
#fCode
%\{
\#include <stdlib.h>    \/\* atof(), atoi() \*\/
%\}
%s expect
%%
expect-floats        BEGIN(expect);
<expect>[0-9]+"."[0-9]+      \{
            printf( "found a float, = %f\\n",
                    atof( yytext ) );
            \}
<expect>\\n           \{
            \/\* that's the end of the line, so
             \* we need another "expect-floats"
             \* before we'll recognize any more
             \* floats
             \*\/
            BEGIN(INITIAL);
            \}
[0-9]+      \{
            printf( "found an integer, = %d\\n",
                    atoi( yytext ) );
            \}
"."         printf( "found a dot\\n" );
#f
#Wrap on
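To make the effect of the inclusive {fEmphasis}expect{f} condition concrete, the same tokenization can be written by hand. This is a rough C sketch under stated assumptions, not what {fCode}flex{f} generates: {fCode}scan\_line{f} and {fCode}struct token\_counts{f} are hypothetical names, tokens are counted rather than printed, and characters that no rule matches are skipped instead of echoed.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

struct token_counts { int floats, ints, dots; };

/* Scan one line and count tokens the way the rules above would. */
struct token_counts scan_line(const char *s)
{
    static const char keyword[] = "expect-floats";
    const size_t klen = sizeof keyword - 1;
    struct token_counts n = { 0, 0, 0 };
    int expect = 0;                       /* 1 once "expect-floats" was seen */

    while (*s != '\0' && *s != '\n') {
        if (strncmp(s, keyword, klen) == 0) {
            expect = 1;                   /* like BEGIN(expect) */
            s += klen;
        } else if (isdigit((unsigned char) *s)) {
            const char *p = s;
            while (isdigit((unsigned char) *p))
                p++;
            /* digits "." digits is one float only in the expect condition */
            if (expect && *p == '.' && isdigit((unsigned char) p[1])) {
                p++;
                while (isdigit((unsigned char) *p))
                    p++;
                n.floats++;
            } else {
                n.ints++;
            }
            s = p;
        } else if (*s == '.') {
            n.dots++;
            s++;
        } else {
            s++;                          /* no rule applies: skip the character */
        }
    }
    return n;
}
```

Calling {fCode}scan\_line("123.456"){f} yields two integers and one dot, while {fCode}scan\_line("expect-floats 123.456"){f} yields a single float. The unqualified integer and dot rules stay available after the keyword because the condition is inclusive.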
Here is a scanner which recognizes (and discards) C
comments while maintaining a count of the current input line.
#Wrap off
#fCode
%x comment
%%
        int line\_num = 1;
"\/\*" BEGIN(comment);
<comment>[^\*\\n]\* \/\* eat anything that's not a '\*' \*\/
<comment>"\*"+[^\*\/\\n]\* \/\* eat up '\*'s not followed by '\/'s \*\/
<comment>\\n ++line\_num;
<comment>"\*"+"\/" BEGIN(INITIAL);
#f
#Wrap on
This scanner goes to a bit of trouble to match as much
text as possible with each rule. In general, when
attempting to write a high-speed scanner, try to match as
much as possible in each rule, as it's a big win for speed.
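For comparison, the exclusive {fEmphasis}comment{f} condition can be modelled as an explicit two-state machine. The C sketch below is an illustration only: {fCode}strip\_comments{f} and the {fCode}ST\_{f} names are hypothetical, and, unlike the rules shown above (which count newlines only inside comments), it assumes every newline should be counted.

```c
#include <stddef.h>

enum scan_state { ST_INITIAL, ST_COMMENT };

/*
 * Copy src to dst without C comments and return the current line number
 * (counting from 1) after the whole input has been consumed.
 * dst must have room for strlen(src) + 1 bytes.
 */
int strip_comments(const char *src, char *dst)
{
    enum scan_state state = ST_INITIAL;
    int line_num = 1;
    size_t out = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        if (src[i] == '\n')
            ++line_num;                  /* assumption: counted in both states */

        if (state == ST_INITIAL) {
            if (src[i] == '/' && src[i + 1] == '*') {
                state = ST_COMMENT;      /* like BEGIN(comment) */
                i++;                     /* consume the '*' as well */
            } else {
                dst[out++] = src[i];     /* like the default ECHO rule */
            }
        } else {
            if (src[i] == '*' && src[i + 1] == '/') {
                state = ST_INITIAL;      /* like BEGIN(INITIAL) */
                i++;                     /* consume the '/' as well */
            }
        }
    }
    dst[out] = '\0';
    return line_num;
}
```

The hand-written loop inspects one character at a time, whereas the {fCode}flex{f} rules above consume long runs of comment text with a single match.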
Note that start condition names are really integer values
and can be stored as such. Thus, the above could be
extended in the following fashion:
#Wrap off
#fCode
%x comment foo
%%
        int line\_num = 1;
        int comment\_caller;
"\/\*" \{
comment\_caller = INITIAL;
BEGIN(comment);
\}
…
<foo>"\/\*" \{
comment\_caller = foo;
BEGIN(comment);
\}
<comment>[^\*\\n]\* \/\* eat anything that's not a '\*' \*\/
<comment>"\*"+[^\*\/\\n]\* \/\* eat up '\*'s not followed by '\/'s \*\/
<comment>\\n ++line\_num;
<comment>"\*"+"\/" BEGIN(comment\_caller);
#f
#Wrap on
Furthermore, you can access the current start condition
using the integer-valued {fCode}YY\_START{f} macro. For example, the
above assignments to {fCode}comment\_caller{f} could instead be
written
#Wrap off
#fCode
comment\_caller = YY\_START;
#f
#Wrap on
Flex provides {fCode}YYSTATE{f} as an alias for {fCode}YY\_START{f} (since that
is what's used by AT&T {fCode}lex{f}).
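The save-and-restore idiom can be tried outside a scanner with a simplified model. In the C sketch below, {fCode}BEGIN{f} and {fCode}YY\_START{f} are stand-in macros over a plain integer, not the definitions found in a generated scanner, and {fCode}enter\_comment{f} and {fCode}leave\_comment{f} are hypothetical helpers standing in for the rule actions.

```c
/* Stand-ins for the scanner's start conditions (plain integers). */
enum { INITIAL = 0, foo = 1, comment = 2 };

/* Simplified model of the current start condition; an assumption,
 * not the representation used by a generated scanner. */
static int current_state = INITIAL;

#define BEGIN(sc)  (current_state = (sc))
#define YY_START   (current_state)

static int comment_caller;

/* Corresponds to the action for an opening comment delimiter. */
void enter_comment(void)
{
    comment_caller = YY_START;   /* remember whichever condition was active */
    BEGIN(comment);
}

/* Corresponds to the action for a closing comment delimiter. */
void leave_comment(void)
{
    BEGIN(comment_caller);       /* resume the interrupted condition */
}
```

Because the caller is read from {fCode}YY\_START{f}, one pair of actions serves every condition that can open a comment, instead of one assignment per condition.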
Note that start conditions do not have their own
name-space; %s's and %x's declare names in the same fashion as
\#define's.
Finally, here's an example of how to match C-style quoted
strings using exclusive start conditions, including
expanded escape sequences (but not including checking for
a string that's too long):
#Wrap off
#fCode
%x str
%%
        char string\_buf[MAX\_STR\_CONST];
        char \*string\_buf\_ptr;
\\"      string\_buf\_ptr = string\_buf; BEGIN(str);
<str>\\"        \{ \/\* saw closing quote - all done \*\/
        BEGIN(INITIAL);
        \*string\_buf\_ptr = '\\0';
        \/\* return string constant token type and
         \* value to parser
         \*\/
        \}
<str>\\n        \{
        \/\* error - unterminated string constant \*\/
        \/\* generate error message \*\/
        \}
<str>\\\\[0-7]\{1,3\} \{
        \/\* octal escape sequence \*\/
        unsigned int result;    \/\* %o expects an unsigned int \*\/
        (void) sscanf( yytext + 1, "%o", &result );
        if ( result > 0xff )
                \{
                \/\* error, constant is out-of-bounds \*\/
                \}
        else
                \*string\_buf\_ptr++ = result;
        \}
<str>\\\\[0-9]+ \{
        \/\* generate error - bad escape sequence; something
         \* like '\\48' or '\\0777777'
         \*\/
        \}
<str>\\\\n  \*string\_buf\_ptr++ = '\\n';
<str>\\\\t  \*string\_buf\_ptr++ = '\\t';
<str>\\\\r  \*string\_buf\_ptr++ = '\\r';
<str>\\\\b  \*string\_buf\_ptr++ = '\\b';
<str>\\\\f  \*string\_buf\_ptr++ = '\\f';
<str>\\\\(.|\\n)  \*string\_buf\_ptr++ = yytext[1];
<str>[^\\\\\\n\\"]+        \{
        char \*yptr = yytext;
        while ( \*yptr )
                \*string\_buf\_ptr++ = \*yptr++;
        \}
#f
#Wrap on
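The example deliberately leaves out the check for a string that is too long. One way to add it to the octal-escape action is sketched below in C. The helper {fCode}append\_octal\_escape{f} and its return codes are hypothetical; it assumes the text it receives has the shape matched by the octal rule (a backslash followed by one to three octal digits).

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Decode the octal escape at the start of text and append the resulting
 * byte to buf.  *len is the number of bytes already stored and cap is
 * the size of buf; buf is kept NUL-terminated.
 * Returns 0 on success, -1 for a malformed or out-of-range escape,
 * and -2 if buf has no room left.
 */
int append_octal_escape(const char *text, char *buf, size_t *len, size_t cap)
{
    unsigned int value;

    /* Require a backslash and at least one octal digit. */
    if (text[0] != '\\' || text[1] < '0' || text[1] > '7')
        return -1;
    if (sscanf(text + 1, "%3o", &value) != 1)
        return -1;
    if (value > 0xff)
        return -1;               /* does not fit in a single byte */
    if (*len + 1 >= cap)
        return -2;               /* keep one byte free for the final NUL */

    buf[(*len)++] = (char) value;
    buf[*len] = '\0';
    return 0;
}
```

A scanner action could call such a helper with {fCode}yytext{f} and report an error on a non-zero result; the same length test applies to the other rules that store characters in the buffer.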
Often, such as in some of the examples above, you wind up
writing a whole bunch of rules all preceded by the same
start condition(s). Flex makes this a little easier and
cleaner by introducing a notion of start condition {fUnderline}scope{f}.
A start condition scope is begun with:
#Wrap off
#fCode
<SCs>\{
#f
#Wrap on
where SCs is a list of one or more start conditions.
Inside the start condition scope, every rule automatically
has the prefix {fEmphasis}<SCs>{f} applied to it, until a {fEmphasis}\}{f} which
matches the initial {fEmphasis}\{{f}. So, for example,
#Wrap off
#fCode
<ESC>\{
"\\\\n" return '\\n';
"\\\\r" return '\\r';
"\\\\f" return '\\f';
"\\\\0" return '\\0';
\}
#f
#Wrap on
is equivalent to:
#Wrap off
#fCode
<ESC>"\\\\n" return '\\n';
<ESC>"\\\\r" return '\\r';
<ESC>"\\\\f" return '\\f';
<ESC>"\\\\0" return '\\0';
#f
#Wrap on
Start condition scopes may be nested.
Three routines are available for manipulating stacks of
start conditions:
#Indent +4
#Indent
{fEmphasis}void yy\_push\_state(int new\_state){f}
#Indent +4
pushes the current start condition onto the top of
the start condition stack and switches to {fStrong}new\_state{f}
as though you had used {fEmphasis}BEGIN new\_state{f} (recall that
start condition names are also integers).
#Indent
{fEmphasis}void yy\_pop\_state(){f}
#Indent +4
pops the top of the stack and switches to it via
{fCode}BEGIN{f}.
#Indent
{fEmphasis}int yy\_top\_state(){f}
#Indent +4
returns the top of the stack without altering the
stack's contents.
#Indent
The start condition stack grows dynamically and so has no
built-in size limitation. If memory is exhausted, program
execution aborts.
To use start condition stacks, your scanner must include a
{fEmphasis}%option stack{f} directive (see Options below).
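The behaviour of the three routines can be modelled in a few lines of C. This is a sketch of the semantics described above, not the code that {fCode}flex{f} emits: the {fCode}sc\_{f} names, the initial capacity of 8, and the error messages are all assumptions made for illustration.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical model of a scanner's start condition and its stack. */
static int    sc_current = 0;     /* 0 plays the role of INITIAL  */
static int   *sc_stack   = NULL;  /* saved start conditions       */
static size_t sc_depth   = 0;     /* number of entries in use     */
static size_t sc_cap     = 0;     /* number of entries allocated  */

/* Save the current start condition and switch to new_state. */
void sc_push_state(int new_state)
{
    if (sc_depth == sc_cap) {
        /* Grow on demand, so there is no fixed depth limit. */
        size_t new_cap = sc_cap ? sc_cap * 2 : 8;
        int *grown = realloc(sc_stack, new_cap * sizeof *grown);
        if (grown == NULL) {
            fputs("out of memory expanding start condition stack\n", stderr);
            abort();
        }
        sc_stack = grown;
        sc_cap = new_cap;
    }
    sc_stack[sc_depth++] = sc_current;
    sc_current = new_state;
}

/* Switch back to the most recently saved start condition. */
void sc_pop_state(void)
{
    if (sc_depth == 0) {
        fputs("start condition stack underflow\n", stderr);
        abort();
    }
    sc_current = sc_stack[--sc_depth];
}

/* Return the most recently saved start condition without removing it. */
int sc_top_state(void)
{
    if (sc_depth == 0) {
        fputs("start condition stack is empty\n", stderr);
        abort();
    }
    return sc_stack[sc_depth - 1];
}
```

With a stack, the {fCode}comment\_caller{f} variable of the earlier comment example becomes unnecessary: the rule that opens a comment pushes the {fEmphasis}comment{f} condition, and the rule that closes it pops back to whatever was active before.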